Abstract

We previously showed that close relatives of human coronavirus (HCoV)-229E exist in
African bats. The small sample and limited genomic characterization have so far prevented
further analyses. Here, we tested 2,087 fecal specimens from 11 bat species sampled in Ghana
for HCoV-229E-related viruses by RT-PCR. Only hipposiderid bats tested positive. To 
compare the genetic diversity of bat viruses and HCoV-229E, we tested historical isolates and 
diagnostic specimens sampled globally over 10 years. Bat viruses were five- to sixfold more 
diversified than HCoV-229E in RNA-dependent RNA polymerase (RdRp) and Spike genes. In 
phylogenetic analyses, HCoV-229E strains were monophyletic and not intermixed with 
animal viruses. Bat viruses formed three large clades in close and more distant sister 
relationship. A recently described 229E-related alpaca virus occupied an intermediate 
phylogenetic position between bat and human viruses. According to taxonomic criteria, 
human, alpaca and bat viruses form a single CoV species showing evidence for multiple 
recombination events. HCoV-229E and the alpaca virus showed a major deletion in the Spike 
S1 region compared to all bat viruses. Analyses of four full genomes from 229E-related bat 
CoVs revealed an eighth open reading frame (ORF8) located at the genomic 3’-end. ORF8 
also existed in the 229E-related alpaca virus. Re-analysis of HCoV-229E sequences showed a 
conserved transcription regulatory sequence preceding remnants of this ORF, suggesting its 
loss after acquisition of a 229E-related CoV by humans. These data suggested an evolutionary 
origin of 229E-related CoVs in hipposiderid bats, hypothetically with camelids as
intermediate hosts preceding the establishment of HCoV-229E.


Importance 
The ancestral origins of major human coronaviruses (HCoV) likely involve bat hosts. Here, 
we provide conclusive genetic evidence for an evolutionary origin of the common cold virus
HCoV-229E in hipposiderid bats by analyzing a large sample of African bats and
characterizing several bat viruses on a full genome level. Our evolutionary analyses show that
animal and human viruses are genetically closely related, can exchange genetic material and 
form a single viral species. We show that the putative host switches leading to the formation 
of HCoV-229E were accompanied by major genomic changes including deletions in the viral 
spike glycoprotein gene and loss of an open reading frame. We re-analyze a previously 
described genetically related alpaca virus and discuss the role of camelids as potential 
intermediate hosts between bat and human viruses. The evolutionary history of HCoV-229E 
likely shares important characteristics with that of the recently emerged highly pathogenic
MERS-Coronavirus.

Introduction

Coronaviruses (CoV) are enveloped viruses with a single-stranded, positive-sense contiguous 
RNA genome of up to 32 kilobases. The subfamily Coronavirinae contains four genera 
termed Alpha-, Beta-, Gamma- and Deltacoronavirus. Mammals are predominantly infected 
by alpha- and betacoronaviruses, while gamma- and deltacoronaviruses mainly infect avian
hosts (1, 2).


Four human coronaviruses (HCoVs) termed HCoV-229E, -NL63, -OC43 and -HKU1 
circulate in the human population and mostly cause mild respiratory disease (3). HCoV-229E 
is frequently detected in up to 15% of specimens taken from individuals with respiratory 
disease (4-6). Although HCoV-229E can be detected in fecal specimens, HCoVs generally
do not seem to play a role in acute gastroenteritis (7-9). Severe respiratory disease with high
case-fatality rates is caused by severe acute respiratory syndrome (SARS)-CoV and Middle
East respiratory syndrome (MERS)-CoV, which emerged recently. HCoV-229E and
HCoV-NL63 belong to the genus Alphacoronavirus, while HCoV-OC43, HCoV-HKU1, SARS- and
MERS-CoV belong to the genus Betacoronavirus (1, 10).


In analogy to major human pathogens including Ebola virus, rabies virus, mumps virus and 
hepatitis B and C viruses (11-16), the evolutionary origins of SARS- and MERS-CoV were 
traced back to bats (17-22). The genetic diversity of bat CoVs described over the last decade 
exceeds the diversity in other mammalian hosts (2). This has led to speculations on an 
evolutionary origin of all mammalian CoVs in bat hosts (23). Bats share important ecological 
features potentially facilitating virus maintenance and transmission, such as close contact
within large social groups, longevity, and the ability of flight (13, 24).

How humans become exposed to remote wildlife viruses is not always clear (25). Human
infection with SARS-CoV and MERS-CoV was likely mediated by peri-domestic animals.
For SARS-CoV, the suspected source of infection was carnivores (26). Preliminary evidence
suggested that these carnivore hosts may also have adapted SARS-CoV for human infection 
(27). For MERS-CoV, camelids are likely intermediate hosts, supported by circulation of 
MERS-CoV in camel herds globally and for prolonged periods of time (28-30). Whether 
MERS-CoV only recently acquired the capacity to infect humans in camelids is unclear. 

The evolutionary origins of HCoV-229E are uncertain. In 2007, a syndrome of severe 
respiratory disease and sudden death was recognized in captive alpacas from the U.S. (31) and 
an alphacoronavirus genetically closely related to HCoV-229E was identified as the causative 
agent (32). 

In 2009, we detected viruses in fecal specimens from 5 of 75 hipposiderid bats from Ghana 
and showed that these bat viruses were genetically related to HCoV-229E by characterizing 
their partial RNA-dependent RNA polymerase (RdRp) and Nucleocapsid genes (33). Lack of 
specimens containing high CoV RNA concentrations so far prevented a more comprehensive 
characterization of those bat viruses to further address their relatedness to HCoV-229E. Here, 
we tested more than 2,000 bats from Ghana for CoVs related to HCoV-229E. We describe 
highly diversified bat viruses on a full genome level and analyze the evolutionary history of
HCoV-229E and the genetically related alpaca CoV.

Materials and Methods

Bat and human sampling 

Bats were caught in the Ashanti region, central Ghana, during 2009-2011 as described 
previously (21). Archived anonymized respiratory specimens derived from patients sampled
between 2002 and 2011 were obtained from Hong Kong/China, Germany, The Netherlands,
Brazil and Ghana.


RNA purification, coronavirus detection and characterization 

RNA was purified from approximately 20 mg of fecal material suspended in 500 µL
RNAlater stabilizing solution using the MagNA Pure 96 system (Roche, Penzberg, Germany).
Elution volumes were 100 µL. Testing for CoV RNA was done using a real-time RT-PCR
assay designed to allow detection of HCoV-229E and all genetically related bat CoVs known
from our pilot study (33). Oligonucleotide sequences were CoV229Elike-F13948m
TCYAGAGAGGTKGTTGTTACWAAYCT, CoV229Elike-P13990m FAM
(6-carboxyfluorescein)-TGGCMACTTAATAAGTTTGGIAARGCYGG-BHQ1 (Black Hole
Quencher 1) and CoV229Elike-R14138m CGYTCYTTRCCAGAWATGGCRTA. Testing
used the SSIII RT-PCR Kit (Life Technologies, Karlsruhe, Germany) with the following 
cycling protocol in a LightCycler 480 (Roche, Penzberg, Germany): 20 min. at 50 °C for 
reverse transcription, followed by 3 min. at 95 °C and 45 cycles of 15 sec. at 95 °C, 10 sec. at 
58 °C and 20 sec. at 72 °C. CoV quantification relied on cRNA in vitro transcripts generated 
from TA-cloned peri-amplicons using the T7-driven Megascript (Life Technologies,
Heidelberg, Germany) kit as described previously (34). Partial RdRp gene sequences from
real-time RT-PCR-positive specimens were obtained as described previously (18). Full CoV
genomes and Spike gene sequences were generated for those specimens containing highest
CoV RNA concentrations using sets of nested RT-PCR assays (primers available upon
request) located along the HCoV-229E genome and designed to amplify small sequence
islets. Sequence islets were connected by bridging long-range nested PCR using
strain-specific primers (available upon request) and the Expand High Fidelity kit (Roche) on cDNA
templates generated with the Superscript III reverse transcriptase (Life Technologies).
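
The screening oligonucleotides listed above contain IUPAC ambiguity codes, so each one stands for a small pool of concrete sequences. The following is a minimal sketch of how that degeneracy can be counted and expanded. The helper names are hypothetical, and treating inosine (I) in the probe as matching any of the four bases is a simplifying assumption made only for this illustration.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes. Inosine (I) is treated as pairing with any
# base; this is an assumption for illustration, not a statement from the paper.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT", "I": "ACGT",
}

# Oligonucleotide sequences as given in the text.
FORWARD = "TCYAGAGAGGTKGTTGTTACWAAYCT"
PROBE = "TGGCMACTTAATAAGTTTGGIAARGCYGG"
REVERSE = "CGYTCYTTRCCAGAWATGGCRTA"


def degeneracy(oligo):
    """Number of distinct sequences represented by a degenerate oligo."""
    n = 1
    for base in oligo.upper():
        n *= len(IUPAC[base])
    return n


def expand(oligo):
    """List every concrete sequence encoded by a degenerate oligo."""
    choices = (IUPAC[base] for base in oligo.upper())
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    for name, seq in (("forward", FORWARD), ("probe", PROBE), ("reverse", REVERSE)):
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Counting before expanding keeps the sketch cheap for highly degenerate oligos, where listing every variant would be wasteful.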


Phylogenetic analyses 

Bayesian phylogenetic reconstructions were made using MrBayes V3.1 (35) under 
assumption of a GTR+G+I nucleotide substitution model for partial RdRp sequences and the 
WAG amino acid substitution model for translated open reading frames (ORFs). Two million 
generations were sampled every 100 steps, corresponding to 20,000 trees of which 25% were 
discarded as burn-in before annotation using TreeAnnotator V1.5 and visualization using 
FigTree V1.4 from the BEAST package (36). Neighbor-joining phylogenetic reconstructions
were made using MEGA5.2 (37) and a percentage nucleotide distance model, the complete
deletion option and 1,000 bootstrap replicates. Genome comparisons were made using
MEGA5.2 (37) and SSE V1.1 (38), and recombination analyses were made using SimPlot V3.5
(39).
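
To make two of the settings above concrete, the sketch below computes a percentage nucleotide distance under complete deletion and the number of trees retained after burn-in. It illustrates the general definitions only and is not the MEGA or MrBayes implementation. The function names and the toy alignment are hypothetical.

```python
def p_distance_percent(seqs):
    """Pairwise percentage nucleotide distances with complete deletion.

    Complete deletion means that any alignment column containing a gap or an
    ambiguous base in at least one sequence is dropped for all comparisons.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    valid = set("ACGTU")
    keep = [i for i in range(length) if all(s[i].upper() in valid for s in seqs)]
    if not keep:
        raise ValueError("no alignment column survives complete deletion")
    n = len(seqs)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            diffs = sum(seqs[a][i].upper() != seqs[b][i].upper() for i in keep)
            dist[a][b] = dist[b][a] = 100.0 * diffs / len(keep)
    return dist


def retained_trees(generations, sample_every, burnin_fraction):
    """Trees left after discarding the burn-in fraction of sampled trees."""
    sampled = generations // sample_every
    return sampled - int(sampled * burnin_fraction)


if __name__ == "__main__":
    toy = ["ACGTACGT", "ACGTACGA", "AC-TACGA"]  # hypothetical alignment
    print(p_distance_percent(toy))
    # Settings from the text: 2,000,000 generations, sampled every 100, 25% burn-in.
    print(retained_trees(2_000_000, 100, 0.25))
```

With the settings stated in the text, 20,000 sampled trees minus a 25% burn-in leaves 15,000 trees for annotation.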

Results

Specimens from 2,087 bats belonging to 11 species were available for PCR testing. Table 1 
provides details on the overall sample composition and detection rates in individual bat 
species. Only bats belonging to the family Hipposideridae tested positive in 81 of 1,853 
specimens (4.4%). All positive-testing bats had been morphologically identified in the field as 
either Hipposideros cf. ruber or H. abae. Those were the most abundant species within the 
sample. No HCoV-229E-related RNA was detected in the 17 available specimens from
H. jonesi and H. cf. gigas.


An 816 nucleotide (nt) fragment from the RdRp gene was obtained from 41 of the 81 positive 
specimens (GenBank accession nos. KT253259-KT253299). This fragment was used for 
further analysis as the 816 nt sequence yields improved resolution in inference of phylogeny 
as compared to shorter sequences derived from RT-PCR screening of field-derived samples 
(2). To expand the available genomic data for HCoV-229E, the 816 nt RdRp fragment was 
also sequenced from 23 HCoV-229E strains from patients sampled between 2002 and 2011 in
China, Germany, The Netherlands, Brazil, and Ghana. In addition, the 816 nt RdRp fragment
was sequenced from two historical HCoV-229E strains isolated in 1965 and the 1980s (40)
(GenBank accession nos. KT253300-KT253323). In analogy to the official taxonomic
designation SARS-related CoV including human SARS-CoV and related CoVs from other 
animals (1), we hereafter restrict usage of the term HCoV-229E to the human virus and refer 
to the animal viruses as 229E-related CoV. Figure 1A shows a Bayesian phylogeny of the 
partial RdRp gene. The bat virus diversity we observed in our pilot study (represented by 
viruses Buoyem344 and Kwamang19) was expanded greatly. A phylogenetically basal virus 
termed Kwamang8 obtained within our pilot study was not detected again, although the 
present study contained specimens from the same cave and bat species. All human strains
occupied an apical phylogenetic position and were not intermixed with any of the animal
viruses. The recently described alpaca 229E-related CoV (32) clustered with two viruses
obtained from hipposiderid bats in a parallel study from our groups in the Central African
country Gabon (41). The two Gabonese bat-associated viruses differed from the alpaca
229E-related CoV by only 3.2% nucleotide content within the RdRp fragment. Hipposiderid bat
CoVs were sorted neither by sampling site nor by host species in their RdRp genes.
Overall, bat 229E-related CoVs sampled over 3 years differed up to 13.5% in their nt and
3.3% in their amino acid (aa) sequences. Although the HCoV-229E dataset used for
comparison was sampled over 50 years, the human-associated viruses showed 5- to 10-fold less
genetic diversity than bat viruses, with only 1.4% nt and 0.7% aa variation. Because of the
small sequence variation in HCoV-229E, Figure 1A contains only nine representative
HCoV-229E strains. The neighbor-joining phylogeny shown in Figure 1B represents the high
sequence identity between all HCoV-229E strains determined in this study.
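
The fold difference in RdRp diversity quoted above follows directly from the reported maximum distances. A minimal arithmetic sketch, using only the percentages stated in the text (the function name is hypothetical):

```python
def fold_difference(bat_percent, human_percent):
    """Ratio of maximum sequence variation in bat viruses versus HCoV-229E."""
    return bat_percent / human_percent


# Maximum variation within the 816 nt RdRp fragment, as reported in the text.
nt_fold = fold_difference(13.5, 1.4)  # nucleotide level
aa_fold = fold_difference(3.3, 0.7)   # amino acid level

if __name__ == "__main__":
    print(f"nt: {nt_fold:.1f}-fold, aa: {aa_fold:.1f}-fold")
```

The two ratios come out at roughly 9.6 and 4.7, which is the approximate 5- to 10-fold range given in the text.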


To analyze to which extent bat 229E-related CoVs show genetic variation, the Spike gene
encoding the viral glycoprotein was characterized from 15 representative bat viruses (labeled
with a triangle in Figure 1A). Figure 1C shows a Bayesian phylogenetic tree of the bat
229E-related CoV Spike gene sequences and HCoV-229E full Spike sequences sampled over 50
years. The bat viruses formed three genetically diverse lineages, of which two phylogenetically
basal lineages contained bat viruses only. These lineages were sorted according to their
sampling sites Kwamang (abbreviated KW) and Akpafu Todzi (abbreviated AT). A third
lineage contained closely related bat viruses obtained from three different sample sites
separated by several hundred kilometers (Buoyem, Kwamang and Forikrom) (21). These data
suggested co-circulation of different Spike gene lineages within sampling sites as well as the
existence of separate lineages between sites. However, the small number of viruses
characterized from the phylogenetically basal bat clades 1 and 2 implies that caution should
be taken in assertions on geographically separated Spike gene lineages. The alpaca
229E-related CoV and all HCoV-229E strains clustered in apical phylogenetic position compared to
the bat viruses. The most closely related bat viruses from lineage 1 differed from HCoV-229E
by 8.4-13.7%. The two other bat virus lineages were less related to HCoV-229E with
30.6-33.0% aa sequence distance.


Topologies of the Bayesian phylogenetic reconstructions of RdRp and Spike genes from bats 
and the alpaca were not congruent, compatible with past recombination events across animal 
229E-related CoVs. The high similarity of the RdRp gene of human HCoV-229E strains did 
not allow comparisons of the RdRp-based with the Spike-based topology. To further 
investigate the genomic relationships of bat 229E-related CoVs and HCoV-229E, the full 
genomes were determined directly from fecal specimens from four representative bat viruses 
(labeled with circles in Figures 1A and C). Figure 2A shows that bat 229E-related CoV 
genomes comprise 28,014-28,748 nt, which exceeds the length of known HCoV-229E strains 
by 844-1,479 nt. As shown in Figure 2B, HCoV-229E and all bat viruses were closely related 
within the putative ORF 1ab. This allowed the delineation of non-structural proteins (nsp)
1-16 for all bat viruses in analogy to HCoV-229E. Table 2 provides details on length and
cleavage sites of the predicted nsp. Sequence identity in seven concatenated nsp is used by the 
International Committee for the Taxonomy of Viruses (ICTV) for CoV species designation 
(1). As shown in Table 3, the four fully sequenced bat viruses showed translated aa sequence 
identities of 93.3-97.1% with HCoV-229E. This was well above the 90% threshold 
established by the ICTV, indicating all bat 229E-related CoVs and HCoV-229E form a single 
species. Bat virus Kwamang8, which formed a phylogenetically basal sister-clade to the other 
bat viruses and HCoV-229E, could not be sequenced on a full genome level. The aa sequence 
of the partial RdRp gene of Kwamang8 differed by only 3.3% from other bat viruses and 
HCoV-229E. Based upon previous comparisons of CoV RdRp sequences for tentative species
delineation (2, 18), Kwamang8 forms part of the same species as the other bat viruses and
HCoV-229E. This CoV species would also include the recently described alpaca 229E-related
CoV (32), which showed 96.9-97.2% aa sequence identity with HCoV-229E and 94.2-97.8%
with the bat viruses in the seven concatenated nsp domains.
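
The species assignment above is a threshold test on amino acid identity across concatenated nsp domains. The sketch below shows that logic under stated assumptions: the sequences are already aligned and concatenated, gap positions are skipped (this gap handling is an assumption, not taken from the ICTV criteria), and the function names are hypothetical.

```python
ICTV_THRESHOLD = 90.0  # percent aa identity in the concatenated nsp domains


def aa_identity_percent(a, b):
    """Percent identity between two aligned aa sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def same_species(identity_percent, threshold=ICTV_THRESHOLD):
    """True if the identity exceeds the species demarcation threshold."""
    return identity_percent > threshold


if __name__ == "__main__":
    # Identity range reported in the text for bat viruses versus HCoV-229E.
    for identity in (93.3, 97.1):
        print(identity, same_species(identity))
```

Both ends of the reported 93.3-97.1% range pass the test, consistent with the single-species conclusion in the text.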


As shown in Figure 2A, all seven open reading frames (ORFs) known from HCoV-229E 
were found in bat 229E-related CoVs in the sequence ORF 1a/1b-Spike-ORF4-Envelope- 
Membrane-Nucleocapsid. Amino acid identities between predicted ORFs of the bat viruses 
and HCoV-229E ranged from the 67.2-91.6% described above for the translated Spike genes 
to 88.3-94.6% (ORF 1ab), with bat virus lineage 1 consistently showing highest aa sequence 
identities. Table 4 provides details for all sequence comparisons. 

We looked for additional support for the existence of these predicted ORFs by analyzing the 
sequence context at their 5’-termini. This is because in CoVs, ORFs are typically preceded by 
highly conserved transcription regulatory sequence (TRS) elements (42). All putative ORFs 
from bat 229E-related CoVs showed high conservation of the typical HCoV-229E TRS core
sequence UCU C/A AACU and adjacent bases. Table 5 provides details on all putative TRS
elements within bat 229E-related CoV genomes.
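
A search for the TRS core sequence quoted above can be expressed as a simple pattern match. The following is a minimal sketch, assuming the genome is available as a plain string in either RNA or DNA notation. The function name and the toy sequence are hypothetical, and the sketch ignores the adjacent bases that the analysis in the text also considered.

```python
import re

# HCoV-229E TRS core as given in the text: UCU C/A AACU
TRS_CORE = re.compile(r"UCU[CA]AACU")


def find_trs_cores(genome):
    """Return (0-based start, matched core) for each TRS core occurrence."""
    rna = genome.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group()) for m in TRS_CORE.finditer(rna)]


if __name__ == "__main__":
    toy = "GGAUCUCAACUAAAUGGCCTCTAAACTTTATG"  # hypothetical sequence
    for start, core in find_trs_cores(toy):
        print(start, core)
```

In practice each hit would still need to be checked for its position relative to a predicted ORF start, as done for Table 5.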


Figure 3A shows Bayesian phylogenetic trees reconstructed for all individual ORFs. The 
alpaca 229E-related CoV clustered in intermediate position between HCoV-229E and the bat 
viruses in the ORF 1ab and Spike, but with bat viruses only in Membrane, Envelope, 
Nucleocapsid, and ORF4. The divergent topologies again suggested recombination events in 
229E-related CoVs. To find further evidence for recombination events and identify genomic 
breakpoints, 229E-related CoVs were analyzed by bootscanning. As shown in Figure 3B, 
bootscanning supported multiple recombination events involving HCoV-229E, bat
229E-related CoVs and the alpaca 229E-related CoV. Major recombination breakpoints occurred
within the ORF 1ab and the beginning of the Spike gene, compatible with previous analyses of
CoV recombination patterns (2) and the divergent topologies between the RdRp and Spike
genes noted above. Bootscanning also suggested a potential genomic breakpoint within the
Spike gene, mapping to the border of the S1 domain (associated with receptor binding) and the
S2 domain (associated with membrane fusion). This would be consistent with previous evidence
supporting intra-Spike recombination events in bat-associated CoVs (43). To obtain further
support for potential intra-Spike recombination events, separate phylogenetic reconstructions
for the S1 and the S2 domains were made. As shown in Figure 3B, these phylogenetic
reconstructions supported recombination events involving the alpaca 229E-related CoV and
HCoV-229E, but not the bat 229E-related CoVs. In the S1 domain, the alpaca 229E-related
CoV clustered with clinical HCoV-229E strains, while the HCoV-229E reference strain inf-1
isolated in 1962 clustered in phylogenetically basal sister relationship. Only in the S2 domain
was the intermediate position of the alpaca virus compared to bat and human 229E-related
CoVs, noted before in comparisons of the full Spike, maintained. These data may hint at recombination
events between HCoV-229E and the alpaca virus and further supported genetic compatibility
between these two viruses belonging to one CoV species.
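
The idea behind scanning a genome for recombination breakpoints can be illustrated with a sliding-window comparison. Note that this is not bootscanning as implemented in SimPlot, which builds bootstrapped trees for each window. It is a much simpler identity scan that reports which reference is closest to a query in each window. All names, window sizes and toy sequences are hypothetical.

```python
def window_identity(query, reference, window=200, step=20):
    """Percent identity of query versus reference in sliding windows.

    Returns a list of (window midpoint, percent identity) tuples.
    Both sequences must come from the same alignment.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    results = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(query[start:start + window], reference[start:start + window])
            if a != "-" and b != "-"
        ]
        identity = 100.0 * sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0
        results.append((start + window // 2, identity))
    return results


def closest_reference(query, references, window=200, step=20):
    """Name the most similar reference per window.

    A change of the closest reference along the genome marks a candidate
    breakpoint that would need confirmation by phylogenetic methods.
    """
    scans = {name: window_identity(query, ref, window, step) for name, ref in references.items()}
    names = list(references)
    n_windows = len(scans[names[0]])
    out = []
    for i in range(n_windows):
        best = max(names, key=lambda name: scans[name][i][1])
        out.append((scans[best][i][0], best))
    return out


if __name__ == "__main__":
    query = "A" * 20 + "C" * 20
    refs = {"ref1": "A" * 20 + "G" * 20, "ref2": "T" * 20 + "C" * 20}
    print(closest_reference(query, refs, window=10, step=10))
```

In the toy example the closest reference switches from `ref1` to `ref2` halfway along the sequence, which is the kind of signal that separate S1 and S2 phylogenies were then used to examine in the text.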


Three major differences existed between HCoV-229E, the alpaca 229E-related CoV and the 
bat 229E-related CoVs. The first of these differences occurred in the putative ORF4. Similar 
to HCoV-229E strains characterized from clinical specimens, a contiguous ORF4 existed in 
all bat viruses that was 156-164 aa residues longer than the alpaca 229E-related CoV ORF4. 
Re-analysis of the putative ORF4 sequence of the alpaca 229E-related CoV showed that this 
apparently shorter ORF4 was due to an insertion of a single cytosine residue at position 181. 
Without this putative insertion, the alpaca 229E-related CoV ORF4 showed the same length 
as homologous ORFs in bat 229E-related CoVs and HCoV-229E. Since the HCoV-229E 
ORF4 is known to accumulate mutations in cell culture (40), the apparently truncated ORF in
the alpaca 229E-related CoV isolate may thus not occur in vivo. The extended ORF4 of the
alpaca 229E-related CoV would be most closely related to bat viruses from clade 1 with 5.5%
aa sequence distance, compared to at least 8.8% distance from HCoV-229E strains.
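
The effect of a single-nucleotide insertion on ORF length, as described for the alpaca virus ORF4, can be shown on a toy sequence. The sequences below are invented for illustration and are not taken from any 229E-related genome. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def orf_length_codons(seq):
    """Count codons from the first base up to the first in-frame stop codon."""
    seq = seq.upper()
    count = 0
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return count
        count += 1
    return count


def remove_base(seq, position):
    """Remove the base at a 1-based position, e.g. a putative insertion."""
    return seq[:position - 1] + seq[position:]


# Hypothetical intact ORF: ATG GCT GAT GAC TTA GCA TAA
intact = "ATGGCTGATGACTTAGCATAA"
# Same ORF with one extra cytosine at position 7, which shifts the frame
# and creates a premature TGA stop: ATG GCT CGA TGA ...
with_insertion = "ATGGCTCGATGACTTAGCATAA"

if __name__ == "__main__":
    print(orf_length_codons(with_insertion))                  # truncated ORF
    print(orf_length_codons(remove_base(with_insertion, 7)))  # restored length
```

Removing the inserted base restores the full-length reading frame, which mirrors the re-analysis described in the text.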


The second difference was a considerably longer S1 portion of the bat 229E-related CoV 
Spike genes compared to HCoV-229E. Figure 4 shows that the three bat lineages contained 
185-404 additional aa residues upstream of the putative receptor binding domain (44, 45) 
compared to HCoV-229E. Bat lineage 1 which was phylogenetically most closely related to 
HCoV-229E carried the smallest number of additional aa residues. Of note, the alpaca
229E-related CoV was identical to HCoV-229E in the number of aa residues within this region of
the Spike gene.


The third major difference was the existence of an additional putative ORF downstream of the 
Nucleocapsid gene in all bat viruses. Non-homologous ORFs of unknown function 
downstream the Nucleocapsid occur in several alpha- and betacoronaviruses, including Feline 
infectious peritonitis virus (FIPV), Transmissible gastroenteritis virus of swine (TGEV), 
Rhinolophus bat CoV HKU2, Scotophilus bat CoV 512, Miniopterus bat CoV HKU8 (23), the 
Chaerephon bat CoVs BtKY22/BtKY41, the Cardioderma bat CoV BtKY43 (46) and bat 
CoV HKU10 from Chinese Hipposideros and Rousettus species (47). In the genus 
Betacoronavirus, only Bat CoV HKU9 from Rousettus and the genetically related Eidolon bat
CoV BtKY24 (46) carry additional ORFs at this genomic position. No ORF in the 3’-terminal
genome region is known from HCoV-229E. The alpaca 229E-related CoV contains an ORF at
this position termed ORFX by Crossley et al. (32). In analogy to consecutive numbers used to
identify HCoV-229E ORFs, we refer to this ORF as ORF8 hereafter. The putative TRS
context preceding ORF8 was conserved in all bat 229E-related CoVs and in the alpaca
229E-related CoV, suggesting that a corresponding subgenomic mRNA8 may exist. The 3’-UTR of
bat 229E-related CoVs immediately followed the putative ORF8. This was supported by the
existence of a conserved octanucleotide sequence and highly conserved stem elements
forming part of the pseudo-knot typically located at the 5’-end of alphacoronavirus 3’-UTRs
(48). As shown in Figure 5, HCoV-229E shows a high degree of sequence conservation 
compared to bat 229E-related CoVs and the alpaca 229E-related CoV in this genomic region, 
including a highly conserved putative TRS. Bioinformatic analyses (49-51) provided evidence 
for the presence of two transmembrane domains in the predicted proteins 8 of the alpaca and 
the genetically related bat 229E-related viruses. This may imply a role of the predicted protein 
8 in coronaviral interactions with cellular or viral membranes. 

As shown in Figure 5, one of the bat 229E-related CoV lineages represented by virus
KW2E-F56 contained a highly divergent ORF8. In protein BLAST comparisons, the KW2E-F56
ORF8 showed limited similarity to the putative ORF 7b of HKU10 and to the putative ORF8
located upstream of the Nucleocapsid of a Nigerian Hipposideros betacoronavirus termed Zaria
bat CoV (47, 52). This may hint at cross-genus recombination events between different
hipposiderid bat CoVs in the past. However, overall aa sequence identity between these bat
CoV ORFs was very low with maximally 28.2%. As shown in Figure 6, only the central part
of these ORFs contained a stretch of 46 more conserved aa residues showing up to 39.1%
sequence identity and 47.8% similarity (Blosum62 matrix). The origin and function of the
divergent ORF8 thus remain to be determined.
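
The percentages quoted for the 46-residue stretch can be translated back into residue counts. The sketch below is a simple consistency check: the counts it returns are inferred from the reported percentages and are not stated in the text, and the function names are hypothetical.

```python
def percent(count, length):
    """Percentage of positions, rounded to one decimal as in the text."""
    return round(100.0 * count / length, 1)


def counts_matching(reported_percent, length):
    """All residue counts that round to the reported percentage."""
    return [k for k in range(length + 1) if percent(k, length) == reported_percent]


if __name__ == "__main__":
    stretch = 46  # conserved aa residues in the central part of the ORFs
    print("identical residues:", counts_matching(39.1, stretch))
    print("similar residues:", counts_matching(47.8, stretch))
```

Under this reading, the values correspond to 18 identical and 22 similar positions out of 46, which underlines how short the conserved region is.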


Journal of Virology 


Discussion 

We characterize highly diverse bat CoVs on a full genome level and show that these viruses 
form one species together with HCoV-229E and a recently described virus from alpacas (32). 
We analyze the genomic differences between human, bat and alpaca 229E-related CoVs to
elucidate potential host transitions during the formation of HCoV-229E.


A major difference between bat 229E-related CoVs and HCoV-229E was the Spike deletion 
in HCoV-229E compared to the bat viruses. Interestingly, the bat 229E-related CoV lineage 1,
which was phylogenetically most related to HCoV-229E, also carried the smallest number of
additional aa residues. Most chiropteran CoVs are restricted to the gastrointestinal tract,
whereas HCoVs mainly replicate in the respiratory tract (2). The Spike deletion in
HCoV-229E compared to ancestral bat viruses is thus noteworthy, since deletions in this protein have
been associated with changes in coronaviral tissue tropism. This is best illustrated by TGEV,
whose full-length Spike variants are associated with a dual tropism for respiratory and enteric
tract, whereas the deleted variant termed porcine respiratory CoV (PRCV) mainly replicates
in the respiratory tract (53). One could hypothesize that adaptation of bat 229E-related CoV
lineage 1 to both non-chiropteran hosts and to respiratory transmission may have been easier
compared to the other bat 229E-related CoV lineages.

Because the exact aa residues of the HCoV-229E RBD conveying cell entry are not known, it 
is difficult to predict whether the bat viruses may interact with the HCoV-229E cellular 
receptor Aminopeptidase N (45) or its Hipposideros homologue. Characterization of this bat 
molecule and identification of permissive cell culture systems may allow initial susceptibility 
experiments for chimeric viruses. Of note, although the alpaca 229E-related CoV was 
successfully isolated (32), no data on receptor usage and cellular tropism are available so far
(2, 53).

Another major difference was the existence of an ORF8 downstream of the Nucleocapsid gene
in bat 229E-related viruses and the detection of putative sequence remnants of this ORF in 
HCoV-229E. Hypothetically, deterioration of ORF8 in HCoV-229E could have occurred due 
to loss of gene function in human hosts after zoonotic transmission from bats or intermediate 
hosts. This may parallel gradual deletions in the SARS-CoV accessory ORF8 during the 
human epidemic compared to bat SARS-related CoVs (54) and is consistent with 
characterizations of HCoV-229E clinical strains showing high variability of this genomic
region (55).


The virus-host association between 229E-related CoVs and the bat genus Hipposideros is 
strengthened by our virus detections in Hipposideros species in Ghana and in Gabon (41), 
which is separated from Ghana by about 1,800 km. The observed link between 229E-related 
alphacoronaviruses and hipposiderid bats is paralleled by the detections of genetically closely 
related betacoronaviruses in different Hipposideros species from Ghana, Nigeria, Thailand 
and Gabon (33, 41, 52, 56), suggesting restriction of these CoVs to hipposiderid bat genera. 
Due to their proofreading capacity, CoVs show evolutionary rates of 10^-5 to 10^-6
substitutions per site per replication cycle, which is much slower than rates observed for other
RNA viruses (57, 58). Our data thus suggest a long evolutionary history of 229E-related
CoVs in Old World hipposiderid bats that greatly exceeds that of HCoV-229E in humans,
confirming previous hypotheses from our group (33).


The putative role of the alpaca 229E-related CoV in the formation of HCoV-229E is unclear. 
Our data enable new insights into the evolutionary history of HCoV-229E. First, the alpaca 
229E-related CoV contained an intact ORF8 which was genetically related to the homologous 
gene in bat 229E-related CoVs. Second, genes of the alpaca CoV clustered either with bat
viruses only or in intermediate position between bat viruses and HCoV-229E. Because the
alpaca 229E-related CoV showed the same deletion in its Spike gene as HCoV-229E
compared to bat 229E-related CoVs, it may be possible that alpacas represent a first host 
switch from bats followed by a second inter-host transfer from alpacas to humans. The 
relatedness of the alpaca 229E-related CoV to older HCoV-229E strains rather than to 
contemporary ones reported by Crossley et al. would be compatible with this scenario (32). 
However, the alpaca 229E-related CoV was reported only from captive animals in the U.S. 
and whether this virus is indeed endemic in New World alpacas is unclear. Additionally, the 
apparent intra-Spike recombination event may speak against a role of the alpaca virus as the 
direct ancestor of HCoV-229E. Further analyses will be required to confirm this putative 
recombination event, ideally including additional sequence information from old HCoV-229E 
strains. Furthermore, a hypothetical direct transfer of Old World bat viruses to New World 
alpacas appears geographically unfeasible. It would be highly relevant to investigate Old 
World camelids for 229E-related CoVs that may have been passed on to captive alpacas and 
that may represent direct ancestors of HCoV-229E. 

Additional constraints to consider in the hypothetical role of camelids for the evolutionary
history of 229E-related CoVs are the time and place of putative host switches from bats.
Camels were likely introduced to Africa not earlier than 5,000 years ago from the Arabian
Peninsula (59, 60) and could not possibly come into direct contact with West African H. cf.
ruber or H. abae of the Guinean savanna. The majority of CoV species seems to be confined
to host genera (2). Therefore, it may be possible that 229E-related CoV transmission was
mediated through closely related species like H. tephrus, which occurs in the Sahel zone and
comes into contact with populations of H. cf. ruber distantly related to those from the Guinean
savanna (61). This bat species should be analyzed for 229E-related CoVs together with other
genera of the family Hipposideridae, like Asellia or Triaenops, which are desert-adapted bats
sharing their habitat with camelids both in Arabia and Africa and may harbor genetically
related CoVs. An important parallel to this evolutionary scenario is the role of camelids for
the emerging MERS-CoV (30, 62), whose likely ancestors also occur in bats (20, 21).
However, we cannot rule out that the alpaca 229E-related CoV and HCoV-229E represent
two independent zoonotic acquisitions from 229E-related CoVs existing in hipposiderid bats
and potentially yet unknown intermediate hosts.


The existence of different serotypes in the expanded 229E-related CoV species is unclear. 
CoV neutralization is mainly determined by antibodies against the S protein, and particularly 
the S1 domain (63). The phylogenetic relatedness of the S1 domains from the alpaca
229E-related CoV and HCoV-229E suggests that these viruses form one serotype. The most closely
related bat 229E-related CoV lineage showed 8.4% aa sequence distance in the translated
Spike gene from HCoV-229E. This was comparable to the 7.8-18.6% aa distance between
FIPV, TGEV and canine CoV, which belong to one CoV species (Alphacoronavirus 1) and
for which cross-neutralization was observed (64). The approximately 30% Spike aa sequence
distance between the other bat 229E-related lineages and HCoV-229E was comparable to the
distance between HCoV-NL63 and HCoV-229E, which form two different serotypes (65).
HCoV-229E thus likely forms one serotype that includes the alpaca 229E-related CoV and
potentially the most closely related bat 229E-related lineage, while the other bat 229E-related
lineages may form
different serotypes. In our study, lack of bat sera and absence of bat 229E-related CoV
isolates prevented serological investigations. The generation of pseudotyped viruses carrying 
bat 229E-related Spike motifs may allow future serological studies. Of note, our joint analyses 
of Ghanaian patients with respiratory disease in this study and previous work from our group 
investigating Ghanaian villagers (66) showed that Ghanaians were infected with the globally 
circulating HCoV-229E, whereas no evidence of bat 229E-related CoV infecting humans was 
found. If serotypes existed in 229E-related CoVs, serologic studies may thus help to elucidate 
putative exposure of humans and potential camelid intermediate hosts to these bat viruses. 
It should be noted that throughout Africa, bats are consumed as wild game (67) and humans 
frequently live in close proximity to bat caves (68), including usage of bat guano as fertilizer 
and drinking water from these caves (21). These settings potentially facilitate the exposure of 
humans and their peri-domestic animals, including camelids, to these previously remote bat 
viruses. 
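
The Spike amino acid distances compared above (8.4%, 7.8-18.6%, about 30%) are proportions of differing residues between aligned sequences. A minimal Python sketch of such a pairwise distance is given below. The helper name `aa_p_distance` and the two toy sequences are invented for illustration and are not Spike data. Sites with a gap in either sequence are skipped (pairwise deletion), which is an assumption and not necessarily the setting used in this study.

```python
def aa_p_distance(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of differing residues between two aligned amino acid sequences.

    Hypothetical helper: columns with a gap in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    compared = 0
    differing = 0
    for res_a, res_b in zip(seq_a.upper(), seq_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # pairwise deletion of gapped columns
        compared += 1
        if res_a != res_b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return 100.0 * differing / compared


# Toy aligned fragments (invented, not real Spike sequences).
toy_human = "MFVLLVAYAL"
toy_bat = "MFVLLIAY-L"
print(f"{aa_p_distance(toy_human, toy_bat):.1f}% aa distance")
```

Distances from phylogenetic software may additionally apply substitution models or different gap handling, so values from this sketch are only a rough counterpart of the figures cited in the text.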

In summary, HCoV-229E may be a paradigmatic example of the successful introduction of a 
bat CoV into the human population, possibly with camelids as intermediate hosts. 

Figure legends 

Figure 1. Phylogenetic relationships of the genus Alphacoronavirus, HCoV-229E strains 
and the novel bat viruses 

A, Bayesian phylogeny of an 816 nucleotide RdRp gene sequence fragment corresponding to 
positions 13,891-14,705 in HCoV-229E prototype strain inf-1 (GenBank accession no. 
NC_002645) using a GTR+G+I substitution model. SARS-coronavirus (CoV) was used as an 
outgroup. Viruses with additional sequence information generated in this study are marked 
with circles (full genome) or triangles (Spike gene). Bat viruses detected in our 
previous studies from Ghana (33) and Gabon (41) are given in cyan. B, Neighbour-joining 
phylogeny of the same RdRp gene fragment with a nucleotide percentage distance substitution 
model and the complete deletion option. The tree was rooted against HCoV-NL63. Viruses 
are coloured according to their origin. C, Bayesian phylogeny of the full Spike gene of bat 
229E-related CoVs, the alpaca 229E-related CoV and HCoV-229E strains identified with 
GenBank accession numbers and year of isolation, using a WAG amino acid substitution 
model and HCoV-NL63 as an outgroup. The novel bat 229E-related CoVs are shown in 
boldface and red. Branches leading to the outgroup were truncated for graphical reasons as 
indicated by slashed lines. Values at nodes show support of grouping from posterior 
probabilities or 1,000 bootstrap replicates (only values above 0.7 are shown). 


Figure 2. Genome organization of 229E-related coronaviruses and relationships between 
viruses from bats and humans 

A, 229E-related CoV genomes represented by black lines; ORFs are indicated by grey arrows. 
Locations of transcription-regulatory core sequences (TRS) are marked by black dots. 
HCoV-NL63 is shown for comparison. B, Similarity plots generated with SSE V1.1 (38) using a 
sliding window of 400 and a step size of 40 nucleotides (nt). The HCoV-229E prototype 
strain inf-1 was used with animal viruses identified in the legend. 
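
The sliding-window scheme named in this legend (window of 400 nt, step of 40 nt) can be illustrated with a short Python sketch. The function name `windowed_identity` and the toy sequences are assumptions made for illustration. The sketch computes plain percent identity per window and does not reproduce the algorithm implemented in SSE.

```python
def windowed_identity(query, reference, window=400, step=40, gap="-"):
    """Return (window midpoint, percent identity) pairs along two aligned sequences.

    Hypothetical helper: plain identity per window, gapped columns skipped.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to equal length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        matches = 0
        compared = 0
        for q, r in zip(query[start:start + window], reference[start:start + window]):
            if q == gap or r == gap:
                continue  # ignore alignment gaps
            compared += 1
            if q == r:
                matches += 1
        identity = 100.0 * matches / compared if compared else float("nan")
        points.append((start + window // 2, identity))
    return points


# Toy alignment (invented): identical first half, fully divergent second half.
toy_query = "A" * 1000
toy_reference = "A" * 500 + "C" * 500
for midpoint, identity in windowed_identity(toy_query, toy_reference):
    print(midpoint, f"{identity:.1f}")
```

Plotting the returned midpoints against the identity values gives a curve of the kind shown in panel B. A smaller step yields a smoother curve at the cost of more windows.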

Figure 3. Bayesian phylogenies of major open reading frames and recombination 
analysis of HCoV-229E and related animal viruses 

A, Phylogenies were calculated with a WAG amino acid substitution model. The novel bat 
viruses are shown in red. The alpaca CoV is shown in cyan. Filled circles, posterior 
probability support exceeding 0.95; scale bar corresponds to genetic distance. Details on the 
origin of HCoV-229E strain VFC408, which was generated for this study, can be retrieved 
from (69). Branches leading to the outgroup HCoV-NL63 were truncated for graphical reasons. 
B, Bootscan analysis using the Jukes-Cantor algorithm with a sliding window of 1,500 and a 
step size of 300 nt. The HCoV-229E inf-1 strain was used with animal 229E-related viruses as 
identified in the legend. C, Phylogenies of the S1 and S2 subunits were calculated according to 
A. One representative HCoV-229E strain was selected per decade according to (70); GenBank 
accession nos. DQ243974, DQ243964, DQ243984, DQ243967. 


Figure 4. Amino acid sequence alignment of the 5’-end of the Spike gene of HCoV-229E 
and related animal viruses 

Amino acid alignment of the first part of the Spike gene of 229E-related CoVs including four 
bat 229E-related CoVs, the alpaca 229E-related CoV and the HCoV-229E inf-1 strain. 
Conserved amino acid residues are marked in black, sequence gaps are represented by 
hyphens. 


Figure 5. Nucleotide sequence alignment of the genomic 3’-end of HCoV-229E and 
related animal viruses 

Nucleotide alignment of the genome region downstream of the Nucleocapsid gene including 
four bat 229E-related CoVs, the alpaca 229E-related CoV and representative HCoV-229E full 
genomes identified with GenBank accession number or strain name. Dots represent identical 
nucleotides, hyphens represent sequence gaps. Grey bars above alignments indicate open 
reading frames and the beginning of the poly-A tail. The putative start and stop codons of 
ORF8 are labelled lime green, the corresponding putative TRS element is marked blue. The 
conserved genomic sequence elements and the highly conserved stem elements forming part 
of the pseudo-knot (PK) are marked with grey and purple background. 


Figure 6. Amino acid sequence alignment of the putative ORF8 from a bat 229E-related 
coronavirus and closest hits from two other hipposiderid bat coronaviruses 

Conserved amino acid residues between sequence pairs are highlighted in color according to 
amino acid properties, sequence gaps are represented by hyphens. The central domain 
showing higher sequence similarity between compared viruses is boxed for clarity. The 
229E-related alphacoronavirus KW2E-F56 from a Hipposideros cf. ruber detected in this study is 
given in red; the alphacoronavirus HKU10 originated from a Chinese H. pomona and the 
betacoronavirus Zaria from a Nigerian H. gigas. 

[Figure 5 image: nucleotide alignment of the genomic region downstream of the Nucleocapsid gene for HCoV-229E strains, the alpaca CoV and the four bat 229E-related CoVs (KW2E-F151, F01A-F2, AT1A-F1, KW2E-F56)] 

Table 1. Overview of bats tested for 229E-related coronaviruses in Ghana 

Species                      n        Positives (%) 
Coleura afra                 68       0 
Hipposideros abae            242      19 (7.8) 
H. cf. gigas                 12       0 
H. jonesi                    5        0 
H. cf. ruber                 1,611    62 (3.8) 
Nycteris cf. gambiensis      91       0 
Rhinolophus alcyone          4        0 
R. landeri                   9        0 
Taphozous perforatus         21       0 
Lissonycteris angolensis     20       0 
Rousettus aegyptiacus        4        0 
Total                        2,087    81 (3.9) 
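
The totals and the overall detection rate in Table 1 can be re-derived from the per-species counts. The following Python sketch does so. The dictionary layout and the helper name `detection_rate` are illustrative assumptions, and the sample size for R. landeri is taken as 9, the value implied by the stated total of 2,087. Per-species percentages computed this way may differ from the table in the last digit depending on the rounding convention.

```python
# Counts transcribed from Table 1 as (n tested, n positive).
# Assumption: R. landeri n = 9, the value implied by the stated total of 2,087.
bats_tested = {
    "Coleura afra": (68, 0),
    "Hipposideros abae": (242, 19),
    "H. cf. gigas": (12, 0),
    "H. jonesi": (5, 0),
    "H. cf. ruber": (1611, 62),
    "Nycteris cf. gambiensis": (91, 0),
    "Rhinolophus alcyone": (4, 0),
    "R. landeri": (9, 0),
    "Taphozous perforatus": (21, 0),
    "Lissonycteris angolensis": (20, 0),
    "Rousettus aegyptiacus": (4, 0),
}


def detection_rate(positives: int, tested: int) -> float:
    """Detection rate in percent (hypothetical helper)."""
    return 100.0 * positives / tested


total_tested = sum(n for n, _ in bats_tested.values())
total_positive = sum(p for _, p in bats_tested.values())
overall_rate = detection_rate(total_positive, total_tested)

# Species with at least one RT-PCR-positive specimen.
positive_species = {
    species: detection_rate(p, n)
    for species, (n, p) in bats_tested.items()
    if p > 0
}

print(total_tested, total_positive, round(overall_rate, 1))
for species, rate in positive_species.items():
    print(f"{species}: {rate:.2f}%")
```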
Table 2. Coding capacity for the putative non-structural proteins of the novel bat 229E-related 
coronaviruses 

         KW2E-F151                    F01A-F2                      AT1A-F1                      KW2E-F56 
Protein  1st to last aa     Size (aa) 1st to last aa     Size (aa) 1st to last aa     Size (aa) 1st to last aa     Size (aa) 
NSP1     Met1-Gly111        111       Met1-Gly111        111       Met1-Gly111        111       Met1-Gly109        109 
NSP2     Asn112-Gly897      786       Asn112-Gly897      786       Asn112-Gly897      786       Asn110-Gly895      786 
NSP3     Gly898-Ala2494     1597      Gly898-Ala2494     1597      Gly898-Ala2492     1595      Gly896-Ala2489     1594 
NSP4     Gly2495-Gln2975    481       Gly2495-Gln2975    481       Gly2493-Gln2973    481       Gly2490-Gln2970    481 
NSP5     Ala2976-Gln3277    302       Ala2976-Gln3277    302       Ala2974-Gln3275    302       Ala2971-Gln3272    302 
NSP6     Ser3278-Gln3556    279       Ser3278-Gln3556    279       Ser3276-Gln3553    278       Ser3273-Gln3551    279 
NSP7     Ser3557-Gln3639    83        Ser3557-Gln3639    83        Ser3554-Gln3636    83        Ser3552-Gln3634    83 
NSP8     Ser3640-Gln3834    195       Ser3640-Gln3834    195       Ser3637-Gln3831    195       Ser3635-Gln3829    195 
NSP9     Asn3835-Gln3943    109       Asn3835-Gln3943    109       Asn3832-Gln3940    109       Asn3830-Gln3938    109 
NSP10    Ala3944-Gln4078    135       Ala3944-Gln4078    135       Ala3941-Gln4075    135       Ala3939-Gln4073    135 
NSP11    Ser4079-Glu4097    19        Ser4079-Glu4097    19        Ser4076-Glu4094    19        Ser4074-Glu4092    19 
NSP12    Ser4079-Gln5005    927       Ser4079-Gln5005    927       Ser4076-Gln5002    927       Ser4074-Gln5000    927 
NSP13    Ala5006-Gln5602    597       Ala5006-Gln5602    597       Ala5003-Gln5599    597       Ala5001-Gln5597    597 
NSP14    Ser5603-Gln6120    518       Ser5603-Gln6120    518       Ser5600-Gln6117    518       Ser5598-Gln6115    518 
NSP15    Gly6121-Gln6468    348       Gly6121-Gln6468    348       Gly6118-Gln6465    348       Gly6116-Gln6463    348 
NSP16    Ser6469-Lys6768    300       Ser6469-Lys6768    300       Ser6466-Lys6766    301       Ser6464-Lys6763    300 

Table 3. Comparison of amino acid identities of seven conserved replicase domains of the bat 229E-related 
coronaviruses, HCoV-229E and the alpaca 229E-related coronavirus for species delineation 

Percentage amino acid sequence identity 

                         Within          Human Coronavirus 229E (a) vs.                      ACoV (c) vs. 
Domain                   bat 229E (b)    KW2E-F56    AT1A-F1     KW2E-F151   F01A-F2     bat 229E (b) 
ADRP                     75.6-100        75-75.6     91.1-92.9   84.5-85.1   84.5-85.1   76.8-90.5 
NSP5 (3CLpro)            90.7-100        90.4-90.7   97.4-97.7   96.4-96.7   97.4-97.7   90.4-97.4 
NSP12 (RdRp)             97.5-100        95.7-96     97.3-97.6   96.9-97.3   97.2-97.7   97.3-98.9 
NSP13 (NTPase/Hel)       97.2-100        96.5-97.2   97.2-97.8   97.3-98     98-98.7     97.8-99.3 
NSP14 (ExoN/N7-MTase)    96.1-100        95-95.6     97.5-98.1   97.3-97.9   96.9-97.5   96.3-99.2 
NSP15 (NendoU)           92.8-100        92.2        96.3-96.6   96.6-96.8   96.8-97.1   91.4-96.8 
NSP16 (O-MT)             91.7-100        90.7-91     91.7-92     97.3-97     97.3-97.7   90.7-98.0 
Concatenated domains     94.5-100        93.3-93.6   96.4-96.8   96.4-96.7   96.7-97.1   94.2-97.8 

(a) including HCoV-229E Inf-1, HCoV-229E 0349 and HCoV-229E J0304 
(b) including bat CoV KW2E-F56, AT1A-F1, KW2E-F151 and F01A-F2 
(c) ACoV, alpaca coronavirus 

GenBank accession numbers of reference sequences: HCoV-229E Inf-1: NC_002645.1; HCoV-229E 0349: JX503060; HCoV-229E J0304: JX503061; 
Alpaca CoV (ACoV): JQ410000 

Table 4. Amino acid identity between open reading frames of human, bat and camelid 
229E-related coronaviruses 

Percentage amino acid sequence identity 

               Human Coronavirus 229E (a) vs.                                          Within         ACoV (c) vs. 
ORF            KW2E-F151    F01A-F2      AT1A-F1      KW2E-F56     ACoV (c)      bat CoV (b)    bat CoV (b) 
ORF 1a         89.5-89.9    89.5-89.8    92.6-93.1    84.1-84.6    92.9-93.3     83.8-97.9      85.1-93.5 
ORF 1ab        92.5-92.9    92.6-93      94.2-94.6    88.3-88.8    94.6-95       88.7-98.3      89.3-95.2 
Spike          87.5-91.6    87.4-91.4    67.2-68.9    67.2-69.1    92.8-94.4     66.8-92.4      69.7-90.8 
ORF4           92.4-93.1    92.6-93.2    77.3-78.8    71.2-73.6    79.7-78.1     75.7-96.4      67.2-82.8 
Envelope       89.6-90.9    89.6-90.9    77.6-78.9    78.7-80      89.6-90.9     77.3-98.7      77.3-100 
Membrane       90.2-90.7    89.3-89.9    86.2-86.7    87.1-87.6    89.8-90.2     86.7-98.7      86.3-99.1 
Nucleocapsid   90.7-92      90.2-91.5    88.6-90.4    75.8-76.6    88.4-89.7     78.7-99.5      78.2-94 
ORFX/8         -            -            -            -            -             12.5-100       15.2-83.9 

(a) including HCoV-229E Inf-1, HCoV-229E 0349 and HCoV-229E J0304 
(b) including bat CoV KW2E-F56, AT1A-F1, KW2E-F151 and F01A-F2 
(c) ACoV, alpaca coronavirus 

Table 5. Putative transcription regulatory sequences of the novel bat 229E-related coronaviruses and 
HCoV-229E 

ORF            Virus              TRS position   Putative TRS sequence    AUG position 
ORF1ab         HCoV-229E/inf-1    62             UCUCAACUAAACN            293 
               KW2E-F151          62             UCUCAACUAAACN            293 
               F01A-F2            62             UCUCAACUAAACN            293 
               AT1A-F1            62             UCUCAACUAAACN            293 
               KW2E-F56           62             UCUCAACUAAACN            293 
Spike          HCoV-229E/inf-1    20571          UCUCAACUAAAUAAA          20586 
               KW2E-F151          20585          UCUCAACUAAAUAAA          20600 
               F01A-F2            20585          UCUCAACUAAAUAAA          20600 
               AT1A-F1            20576          UCUCAACUAAAAA            20589 
               KW2E-F56           20570          UCUCAACUAAGUA            20583 
ORF4           HCoV-229E/inf-1    24054          UCAACUAAAN               24101 
               KW2E-F151          24644          UCAACUAAACN              24691 
               F01A-F2            24638          UCAACUAAACN              24685 
               AT1A-F1            25290          UCAACUAAACN              25337 
               KW2E-F56           25258          UCAACUAAACN              25304 
Envelope       HCoV-229E/inf-1    24599          UCUCAACUAAN              24762 
               KW2E-F151          25190          UCUCAACUAACN             25349 
               F01A-F2            25184          UCUCAACUAACN             25343 
               AT1A-F1            25836          UCUCAACUAACN             25992 
               KW2E-F56           25805          UCAACUAACN               25962 
Membrane       HCoV-229E/inf-1    24991          UCUAAACUAAACGACA         25007 
               KW2E-F151          25578          UCUAAACUAAACGACA         25594 
               F01A-F2            25572          UCUAAACUAAACGACA         25588 
               AT1A-F1            26224          UCUAAACUAAACG            26237 
               KW2E-F56           26185          UCUAAACUAAACG            26198 
Nucleocapsid   HCoV-229E/inf-1    25680          UCUAAACUGAACGAAAAG       25698 
               KW2E-F151          26270          UCUAAACUGAACGAAAAG       26288 
               F01A-F2            26264          UCUAAACUGAACGAAAAG       26282 
               AT1A-F1            26934          UCUAAACUGAACGAAAACC      26953 
               KW2E-F56           26874          UCUAAACUGAACGAAAACC      26893 
ORF8           KW2E-F151          27468          UCAACUAAAC               27478 
               F01A-F2            27462          UCAACUAAAC               27472 
               AT1A-F1            28130          UCAACUAAAC               28141 
               KW2E-F56           28124          UCAACUAAAC               28134 

TRS position: genome position of the first residue of the putative TRS sequence; AUG position: genome position of the first 
base of the start codon; N: base residues between the end of the putative TRS sequence and the start codon (where 
applicable), whose number follows from the two genome positions. 
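
Following the footnote definition, the number of base residues between the end of a putative TRS sequence and the start codon can be computed from the two genome positions and the length of the TRS sequence. The Python sketch below illustrates this arithmetic with entries from Table 5. The helper name `trs_spacer_length` is hypothetical, and the TRS sequences are taken as printed in the table, without the N placeholder.

```python
def trs_spacer_length(trs_start: int, trs_sequence: str, aug_start: int) -> int:
    """Bases between the last TRS residue and the first base of the start codon.

    Hypothetical helper: positions are 1-based genome coordinates as in Table 5.
    """
    first_base_after_trs = trs_start + len(trs_sequence)
    spacer = aug_start - first_base_after_trs
    if spacer < 0:
        raise ValueError("start codon lies within or before the TRS sequence")
    return spacer


# Leader TRS at position 62 and first start codon at position 293.
print(trs_spacer_length(62, "UCUCAACUAAAC", 293))

# Membrane gene of HCoV-229E inf-1: the TRS sequence runs directly into the AUG.
print(trs_spacer_length(24991, "UCUAAACUAAACGACA", 25007))
```

A result of zero corresponds to table entries in which the printed TRS sequence ends immediately before the start codon.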


